Benchmarking Unsupervised Outlier Detection with Realistic Synthetic Data


Abstract

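Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contains outliers with various and unknown characteristics. Fully synthetic data usually consists of outliers and regular instances with clear characteristics and thus allows for a more meaningful evaluation of detection methods in principle. Nonetheless, there have only been few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty of arriving at a good coverage of different domains with synthetic data. In this work, we propose a generic process for the generation of datasets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. We describe a generic process for the benchmarking of unsupervised outlier detection, as sketched so far. We then describe three instantiations of this generic process that generate outliers with specific characteristics, like local outliers. To validate our process, we perform a benchmark with state-of-the-art detection methods and carry out experiments to study the quality of data reconstructed in this way. Next to showcasing the workflow, this confirms the usefulness of our proposed process. In particular, our process yields regular instances close to the ones from real data. Summing up, we propose and validate a new and practical process for the benchmarking of unsupervised outlier detection.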
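The core idea in the abstract (model the regular instances of a real dataset, then generate outliers with a chosen characteristic) can be illustrated with a minimal sketch. The abstract does not say which models or outlier generators the authors use, so everything below is an assumption made for illustration: regular instances are modelled with independent per-attribute Gaussians, "global" outliers are rejection-sampled far from that model, and all function names, thresholds, and the stand-in data are hypothetical.

```python
import random
import statistics


def fit_independent_gaussians(regular):
    """Estimate a per-attribute mean and standard deviation (assumed model)."""
    columns = list(zip(*regular))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def sample_regular(params, n, rng):
    """Draw n synthetic regular instances from the fitted model."""
    return [[rng.gauss(mu, sigma) for mu, sigma in params] for _ in range(n)]


def max_abs_z(point, params):
    """Largest absolute z-score of a point over all attributes."""
    return max(abs(x - mu) / sigma for x, (mu, sigma) in zip(point, params))


def sample_global_outliers(params, n, rng, z_min=3.0, z_max=6.0):
    """Rejection-sample n points that lie far from the fitted model.

    Candidates are uniform in a box of +/- z_max standard deviations and are
    kept only if at least one attribute deviates by more than z_min.
    """
    outliers = []
    while len(outliers) < n:
        candidate = [rng.uniform(mu - z_max * sigma, mu + z_max * sigma)
                     for mu, sigma in params]
        if max_abs_z(candidate, params) > z_min:
            outliers.append(candidate)
    return outliers


rng = random.Random(0)
# Hypothetical stand-in for the regular instances of a real benchmark dataset.
real_regular = [[rng.gauss(10.0, 2.0), rng.gauss(-5.0, 0.5)] for _ in range(500)]

params = fit_independent_gaussians(real_regular)
synthetic_regular = sample_regular(params, 500, rng)
synthetic_outliers = sample_global_outliers(params, 25, rng)

# Labelled synthetic benchmark: 0 = regular, 1 = outlier.
dataset = [(x, 0) for x in synthetic_regular] + [(x, 1) for x in synthetic_outliers]
```

Because the outliers are generated by construction, their characteristic (here, a large deviation in at least one attribute) is known, which is what makes the evaluation of a detector on such data interpretable. A realistic instantiation would likely need a richer model of the regular instances than independent Gaussians.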


Similar resources

Outlier Detection with Uncertain Data

In recent years, many new techniques have been developed for mining and managing uncertain data. This is because of the new ways of collecting data which has resulted in enormous amounts of inconsistent or missing data. Such data is often remodeled in the form of uncertain data. In this paper, we will examine the problem of outlier detection with uncertain data sets. The outlier detection probl...


Multivariate outlier detection with compositional data

Multivariate outlier detection is usually based on Mahalanobis distances, by plugging in robust estimates of location and covariance. For compositional data, carrying only relative information, a special transformation needs to be consulted in order to be able to work in the appropriate geometry. The effect of the transformation is discussed in this contribution. Furthermore, different possibil...


Outlier detection for skewed data

Most outlier detection rules for multivariate data are based on the assumption of elliptical symmetry of the underlying distribution. We propose an outlier detection method which does not need the assumption of symmetry and does not rely on visual inspection. Our method is a generalization of the Stahel-Donoho outlyingness. The latter approach assigns to each observation a measure of outlyingne...


Outlier Detection in Multivariate Data

The objective of this research is the detection of outliers in multivariate data employing various distance measures, particularly using robust regression diagnosis techniques. Several classical outlier identification methods are based on the sample mean and covariance matrix in general. But they do not always yield better results, as they themselves are affected by the outliers. Sometimes one outlier...


Outlier detection in astronomical data

Astronomical data sets have experienced an unprecedented and continuing growth in volume, quality, and complexity over the past few years, driven by advances in telescope, detector, and computer technology. Like many other fields, astronomy has become a very data-rich science. Information content measured in multiple Terabytes, and even larger, multi-Petabyte data sets are on the horizo...



Journal

Journal title: ACM Transactions on Knowledge Discovery From Data

Year: 2021

ISSN: 1556-472X, 1556-4681

DOI: https://doi.org/10.1145/3441453